Crowd-Sourced Wrapper Construction with End Users

Author

  • Steve Gardiner
Abstract

The web contains a tremendous number of data sets that are presented visually and that computers cannot currently read. Most people, however, understand these data sets with little difficulty, which suggests that crowd-sourcing techniques could be applied to the problem of understanding web data sets. In this thesis we study several issues in crowd-sourcing a collection of wrappers, small programs that map data sets to their logical structure, from a crowd of end users. We pay special attention to the majority of users who are not programmers. We present a prototype system, Mixer, that allows end users to demonstrate and execute repetitive ad hoc data retrieval actions over multiple data sources. The evaluation of the prototype suggests that end users are able to construct and combine data from multiple data sources, under the strong assumption that the input to the individual query systems, as well as their output, is fully understood. Furthermore, we present another prototype system, SmartWrap, showing that end users, explicitly including non-programmers, can demonstrate actions sufficient to construct a wrapper for a data set, i.e., the instructions needed to understand the data set. A pilot crowd is able to construct wrappers for most requested data sets, but this gives no indication that the wrapped data sets are useful or relevant to anyone. To narrow the search for relevant data sets, we turn to an audience that theory predicts will make use of additional structure in web pages: blind people. We present the theory and the results of a preliminary study demonstrating the concrete benefits that non-visual users of the web stand to gain from increased structure in web pages and, more specifically, from the introduction of web tables in place of template-driven visual data sets.

Similar resources

Integrating Volunteered Human Sensor Data into Crowd-sourced Platforms: A Use Case on Noise Pollution Monitoring and OpenStreetMap

Local and national governments deploy environmental monitoring to improve decision making for sustainable development. Especially the monitoring and management of pollution such as noise is of local, national and European relevance, as expressed by the European Noise Directive (END) (European Parliament, 2002). The European Environment Agency (EEA) asks all its member and collaborating countrie...

Crowdsourced pedestrian map construction for short-term city-scale events

This paper targets the construction of pedestrian maps for city-scale events from GPS trajectories of visitors. Incomplete data with a short lifetime, varying localisation accuracy, and a high variation of walking behaviour render the extraction of a pedestrian map from crowd-sourced data a difficult task. Traditional network or map construction methods lean on accurate GPS trajectories typical...

Pair Me Up: A Web Framework for Crowd-Sourced Spoken Dialogue Collection

We describe and analyze a new web-based spoken dialogue data collection framework. The framework enables the capture of conversational speech from two remote users who converse with each other and play a dialogue game entirely through their web browsers. We report on the substantial improvements in the speed and cost of data capture we have observed with this crowd-sourced paradigm. We also ana...

SmartRoad: A Mobile Phone Based Crowd-Sourced Road Sensing System

In this paper we describe SmartRoad, a road sensing system that generates and collects mobile sensory data from vehicle-resident mobile phones, in enabling and supporting crowd-sourced road sensing applications and services, as an alternative to expensive road surveys conducted traditionally. We implement the SmartRoad prototype system, and deploy it on 35 volunteer users’ vehicles for 2 months...

N Heads Are Better Than One

Social network platforms have transformed how people communicate and share information. However, as these platforms have evolved, the ability for users to control how and with whom information is being shared introduces challenges concerning the configuration and comprehension of privacy settings. To address these concerns, our crowd sourced approach simplifies the understanding of privacy sett...

Publication date: 2015